I. Introduction


Table 1. Sample of 5 randomly chosen countries from the data set used in this study
Country     agricultural_land_p_2016  food_index_2015  forest_area_p_2015  population_growth_p_2015  aded_val_GDP_2015
Costa Rica  34.45946                  126.09           53.97571             1.0869528                4.956675
Italy       43.23451                   93.40           31.60740            -0.0963761                2.065237
Chile       21.17165                  111.84           23.85237             1.1777575                3.638883
Angola      47.47734                  181.06           46.40732             3.4388507                9.122535
Cyprus      12.15693                   80.83           18.69048             0.7521855                1.874272

II. Exploratory data analysis


Table 2. Summary of the percent of agricultural land in different countries, in 2016
n    min        median    mean      max       sd
184  0.5576923  39.95677  38.88085  82.55971  21.85271
Figure 1. Distribution for the percent of agricultural land in different countries, in 2016

Figure 2. Distribution for the 2015 food production index for different countries

Figure 7.1. Interactive Scatterplot for the percent of agricultural land in different countries, in 2016 against their 2015 food production index. The red line is the best fit line. The blue curve is the Loess curve.

Figure 3. Distribution for the percent of forest area in different countries, in 2015

Figure 7.1. Interactive Scatterplot for the percent of agricultural land in different countries, in 2016 against their percent of forest area, in 2015. The red line is the best fit line. The blue curve is the Loess curve.

Figure 3. Distribution for the percent annual population growth for different countries in 2015.

Figure 7.1. Interactive Scatterplot for the percent of agricultural land in different countries, in 2016 against their percent annual population growth in 2015. The red line is the best fit line. The blue curve is the Loess curve.

Figure 3. Distribution for the Added value of Agriculture, forestry, and fishing to the GDP of different countries, in 2015

Figure 7.1. Interactive Scatterplot for the percent of agricultural land in different countries, in 2016 against the added value of Agriculture, forestry, and fishing to their GDP in 2015. The red line is the best fit line. The blue curve is the Loess curve.


III. Multiple linear regression

i. Methods


## 
## Call:
## lm(formula = agricultural_land_p_2016 ~ ns(food_index_2015, df = 4) + 
##     ns(population_growth_p_2015, df = 4) + forest_area_p_2015 + 
##     ns(aded_val_GDP_2015, df = 4), data = tidy_joined_dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.655 -11.874  -0.442  13.314  39.414 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            21.68221   19.34390   1.121 0.263919    
## ns(food_index_2015, df = 4)1           19.45436   13.08654   1.487 0.138975    
## ns(food_index_2015, df = 4)2           14.41294   10.28967   1.401 0.163121    
## ns(food_index_2015, df = 4)3           28.95063   30.29454   0.956 0.340612    
## ns(food_index_2015, df = 4)4           -3.06774   11.46043  -0.268 0.789269    
## ns(population_growth_p_2015, df = 4)1  -6.22456   11.12223  -0.560 0.576455    
## ns(population_growth_p_2015, df = 4)2   7.96660   10.34965   0.770 0.442519    
## ns(population_growth_p_2015, df = 4)3 -35.10175   25.60437  -1.371 0.172205    
## ns(population_growth_p_2015, df = 4)4 -51.68574   14.33979  -3.604 0.000411 ***
## forest_area_p_2015                     -0.40591    0.06205  -6.542 6.86e-10 ***
## ns(aded_val_GDP_2015, df = 4)1         17.02739    6.18359   2.754 0.006535 ** 
## ns(aded_val_GDP_2015, df = 4)2         12.50304    8.91819   1.402 0.162747    
## ns(aded_val_GDP_2015, df = 4)3         47.21576   14.67902   3.217 0.001553 ** 
## ns(aded_val_GDP_2015, df = 4)4         12.00965   14.59038   0.823 0.411592    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.77 on 170 degrees of freedom
## Multiple R-squared:  0.3144, Adjusted R-squared:  0.262 
## F-statistic: 5.996 on 13 and 170 DF,  p-value: 4.052e-09
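
The summary above comes from an `lm()` call that combines natural cubic splines (`ns(..., df = 4)`) for three predictors with a plain linear term for forest area. The sketch below repeats that model structure on simulated data, only to show where the 13 slope coefficients and 170 residual degrees of freedom come from. The data frame `toy` and its column names are hypothetical stand-ins; the study's `tidy_joined_dataset` is not reproduced here, so the fitted numbers will differ from the output above.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)
n_countries <- 184  # same sample size as reported in Table 2

# Hypothetical stand-in data, NOT the study's data set
toy <- data.frame(
  food_index    = runif(n_countries, 60, 190),
  pop_growth    = rnorm(n_countries, 1.3, 1.1),
  forest_area   = runif(n_countries, 0, 95),
  added_val_gdp = rexp(n_countries, 1 / 10)
)
toy$agri_land <- 45 - 0.4 * toy$forest_area + rnorm(n_countries, 0, 18)

# Same structure as the reported formula: three spline terms plus one linear term
fit <- lm(
  agri_land ~ ns(food_index, df = 4) + ns(pop_growth, df = 4) +
    forest_area + ns(added_val_gdp, df = 4),
  data = toy
)

n_coef   <- length(coef(fit))  # 1 intercept + 4 + 4 + 1 + 4 coefficients
resid_df <- df.residual(fit)   # observations minus estimated coefficients
```

Each `ns(x, df = 4)` term expands into four basis columns, which is why the coefficient table lists rows numbered 1 to 4 for each splined predictor. The individual basis coefficients are not slopes in the usual sense, so they are usually judged jointly per term rather than row by row.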
Figure 14. Normal Q-Q plot for the percent of agricultural land in different countries, in 2016

Figure 15. Residuals distribution for the statistical model

Figure 16. Residuals graph for the fitted values, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 17. Residuals graph for the food production Index, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 18. Residuals graph for the percent of forest area in different countries, in 2015, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 18. Residuals graph for the percent annual population growth for different countries in 2015, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 18. Residuals graph for the Added value of Agriculture, forestry, and fishing to the GDP of different countries, in 2015, with a Lowess curve in blue and a horizontal line at zero in red.

Table 3. VIF table
Term                                  GVIF      Df  GVIF^(1/(2*Df))
ns(food_index_2015, df = 4)           1.616606  4   1.061880
ns(population_growth_p_2015, df = 4)  1.964128  4   1.088043
forest_area_p_2015                    1.100039  1   1.048827
ns(aded_val_GDP_2015, df = 4)         2.018545  4   1.091767
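
The last column of the VIF table is the generalized VIF raised to the power 1/(2*Df), which puts terms with different degrees of freedom on a comparable scale. As a quick arithmetic check, the sketch below recomputes that column from the GVIF and Df values printed in the table. The short vector names are illustrative labels, not identifiers from the study.

```r
# GVIF and Df values copied from the VIF table
gvif <- c(
  food_index        = 1.616606,
  population_growth = 1.964128,
  forest_area       = 1.100039,
  added_value_gdp   = 2.018545
)
term_df <- c(4, 4, 1, 4)

# Scaled GVIF: comparable to the square root of an ordinary VIF
scaled_gvif <- gvif^(1 / (2 * term_df))
print(round(scaled_gvif, 6))
```

All four scaled values sit close to 1, which suggests little collinearity among the model terms.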

ii. Model Results and Interpretation


## lm(formula = agricultural_land_p_2016 ~ ns(food_index_2015, df = 4) + 
##     ns(population_growth_p_2015, df = 4) + forest_area_p_2015 + 
##     ns(aded_val_GDP_2015, df = 4), data = tidy_joined_dataset)
Table 4. Model Summary Table
Term                                   Estimate   Std. Error  t value   Pr(>|t|)
(Intercept)                             21.68221  19.34390     1.12088  0.26392
ns(food_index_2015, df = 4)1            19.45436  13.08654     1.48659  0.13897
ns(food_index_2015, df = 4)2            14.41294  10.28967     1.40072  0.16312
ns(food_index_2015, df = 4)3            28.95063  30.29454     0.95564  0.34061
ns(food_index_2015, df = 4)4            -3.06774  11.46043    -0.26768  0.78927
ns(population_growth_p_2015, df = 4)1   -6.22456  11.12223    -0.55965  0.57645
ns(population_growth_p_2015, df = 4)2    7.96660  10.34965     0.76975  0.44252
ns(population_growth_p_2015, df = 4)3  -35.10175  25.60437    -1.37093  0.17220
ns(population_growth_p_2015, df = 4)4  -51.68574  14.33979    -3.60436  0.00041
forest_area_p_2015                      -0.40591   0.06205    -6.54190  <0.00001
ns(aded_val_GDP_2015, df = 4)1          17.02739   6.18359     2.75364  0.00653
ns(aded_val_GDP_2015, df = 4)2          12.50304   8.91819     1.40197  0.16275
ns(aded_val_GDP_2015, df = 4)3          47.21576  14.67902     3.21655  0.00155
ns(aded_val_GDP_2015, df = 4)4          12.00965  14.59038     0.82312  0.41159

Statistic                Value      df
Residual Standard Error  18.773     170
Multiple R-squared       0.314
Adjusted R-squared       0.262

Statistic                Value      Numerator df  Denominator df
Model F-statistic        5.996      13            170
P-value                  4.052e-09
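
The adjusted R-squared and the overall F-statistic in the model summary both follow from the multiple R-squared, the sample size, and the number of estimated slopes. The sketch below is a rough consistency check using the rounded R-squared of 0.3144 from the `lm` summary, n = 184 from Table 2, and 13 slopes. Because the input is rounded, the results agree with the reported values only to about three significant digits.

```r
r2 <- 0.3144  # Multiple R-squared as printed in the lm summary (rounded)
n  <- 184     # number of countries (Table 2)
p  <- 13      # estimated slopes: 4 + 4 + 1 + 4

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variation per slope over unexplained per residual df
f_stat  <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(c(adj_r2 = adj_r2, f_stat = f_stat, p_value = p_value))
```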

iii. Inference for multiple regression

Table 5. ANOVA Table
Term                                  Df   Sum Sq     Mean Sq     F value  Pr(>F)
ns(food_index_2015, df = 4)           4     4401.079   1100.2697   3.1218  0.0165
ns(population_growth_p_2015, df = 4)  4     4768.515   1192.1286   3.3825  0.0108
forest_area_p_2015                    1    13639.881  13639.8812  38.7009  <0.0001
ns(aded_val_GDP_2015, df = 4)         4     4665.087   1166.2718   3.3091  0.0122
Residuals                             170  59915.433    352.4437
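
Each F value in the ANOVA table is the term's mean square divided by the residual mean square, and the same sums of squares recover the model's R-squared and residual standard error. The sketch below redoes that arithmetic from the numbers printed in the table. It assumes the table reports sequential sums of squares, as R's `anova()` does for a single `lm` fit. The vector names are illustrative labels.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
sum_sq <- c(
  food_index        = 4401.079,
  population_growth = 4768.515,
  forest_area       = 13639.881,
  added_value_gdp   = 4665.087
)
term_df  <- c(4, 4, 1, 4)
ss_resid <- 59915.433
df_resid <- 170

ms_resid <- ss_resid / df_resid          # residual mean square
f_values <- (sum_sq / term_df) / ms_resid
p_values <- pf(f_values, term_df, df_resid, lower.tail = FALSE)

r2  <- sum(sum_sq) / (sum(sum_sq) + ss_resid)  # explained share of total variation
rse <- sqrt(ms_resid)                          # residual standard error

print(round(cbind(F = f_values, p = p_values), 4))
```

The recomputed R-squared and residual standard error line up with the 0.3144 and 18.77 reported in the model summary, which is a useful check that the two tables describe the same fit.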

IV. Discussion

i. Conclusions

ii. Limitations

iii. Further questions


V. Citations and References